A Simple Linear Time (1+ε)-Approximation Algorithm for k-Means Clustering in Any Dimensions
Abstract
We present the first linear time (1+ε)-approximation algorithm for the k-means problem for fixed k and ε. Our algorithm runs in O(nd) time, which is linear in the size of the input. Another feature of our algorithm is its simplicity – the only technique involved is random sampling.
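The abstract names random sampling as the only technique used. A standard fact behind sampling-based k-means approximations concerns a single cluster. Let m points be drawn uniformly at random, with replacement. The centroid of that sample has expected 1-means cost exactly (1 + 1/m)·OPT. By Markov's inequality, m = ⌈2/ε⌉ samples therefore give a (1+ε)-approximation with probability at least 1/2.

The following minimal sketch illustrates only this k = 1 building block. It is not the paper's full k-means algorithm. The function names and the Gaussian demo data are hypothetical.

```python
import math
import random

Point = tuple[float, ...]


def centroid(points: list[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points (the optimal 1-means center)."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def one_means_cost(points: list[Point], center: Point) -> float:
    """Sum of squared Euclidean distances from every point to `center`."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, center)) for p in points)


def sampled_center(points: list[Point], m: int, rng: random.Random) -> Point:
    """Centroid of m points drawn uniformly at random with replacement."""
    return centroid([rng.choice(points) for _ in range(m)])


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10_000)]
    eps = 0.1
    # Expected excess cost is OPT/m; Markov's inequality then bounds the excess
    # by eps*OPT with probability >= 1/2 when m >= 2/eps.
    m = math.ceil(2 / eps)
    opt = one_means_cost(pts, centroid(pts))
    approx = one_means_cost(pts, sampled_center(pts, m, rng))
    print(f"cost ratio with {m} samples: {approx / opt:.3f}")
```

The sample size m depends only on ε, not on n. This is why the sampling step itself costs time independent of the input size.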
Similar Resources
Linear-Time Approximation Schemes for Clustering Problems
We present a general approach for designing approximation algorithms for a fundamental class of geometric clustering problems in arbitrary dimensions. More specifically, our approach leads to simple randomized algorithms for the k-means, k-median and discrete k-means problems that yield (1 + ε) approximations with probability ≥ 1/2 and running times of $O(2^{(k/\varepsilon)^{O(1)}} dn)$. These are the first algor...
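To make the three problems named here concrete, the following is a minimal sketch of their objectives. It assumes Euclidean distance and a given set of k centers:

- k-means sums squared distances to the nearest center.
- k-median sums plain distances to the nearest center.
- Discrete k-means is k-means with the centers restricted to input points.

The function names are hypothetical. The exhaustive solver is only a reference optimum for toy inputs; it is not any of the algorithms described in the abstract.

```python
import math
from itertools import combinations

Point = tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points: list[Point], centers: list[Point]) -> float:
    """k-means objective: each point pays its squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmedian_cost(points: list[Point], centers: list[Point]) -> float:
    """k-median objective: each point pays its plain (unsquared) distance to the nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def discrete_kmeans_opt(points: list[Point], k: int) -> tuple[float, tuple[Point, ...]]:
    """Exact discrete k-means by exhaustive search: the k centers must be input points.
    Tries all C(n, k) center sets, so it is only usable on tiny inputs."""
    best = min(combinations(points, k), key=lambda cs: kmeans_cost(points, list(cs)))
    return kmeans_cost(points, list(best)), best
```

A (1+ε)-approximation for discrete k-means must return input-point centers whose cost is at most (1+ε) times the value computed by `discrete_kmeans_opt`.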
Linear Time Algorithms for Clustering Problems in Any Dimensions
We generalize the k-means algorithm presented by the authors [14] and show that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness). We prove these properties for the k-median and the discrete k-means clustering problems, resulting in $O(2^{(k/\varepsilon)^{O(1)}} dn)$ time (1 + ε)-approximation algorithms fo...
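One elementary fact connects discrete k-means to k-means; it is stated here for a single cluster only. Summing Σ_p ||p − x||² over every input point x gives exactly 2n times the optimal 1-means cost. The best input point used as the center is therefore at most a factor 2 worse than the centroid.

The following sketch checks this numerically. It is standard background, not the random-sampling or tightness properties the authors prove. The function names are hypothetical.

```python
import random

Point = tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost_at(points: list[Point], x: Point) -> float:
    """1-means cost of `points` when the single center is placed at x."""
    return sum(sq_dist(p, x) for p in points)


def one_means_opt(points: list[Point]) -> float:
    """Optimal (continuous) 1-means cost: the center is the centroid."""
    c = tuple(sum(coord) / len(points) for coord in zip(*points))
    return cost_at(points, c)


def discrete_one_means_opt(points: list[Point]) -> float:
    """Optimal discrete 1-means cost: the center must be one of the input points."""
    return min(cost_at(points, x) for x in points)


def mean_cost_over_input_centers(points: list[Point]) -> float:
    """Average of cost_at(points, x) over all input points x; equals 2 * one_means_opt."""
    return sum(cost_at(points, x) for x in points) / len(points)


if __name__ == "__main__":
    rng = random.Random(7)
    pts = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(200)]
    opt = one_means_opt(pts)
    print(discrete_one_means_opt(pts) / opt)        # lies in [1, 2]
    print(mean_cost_over_input_centers(pts) / opt)  # 2, up to floating-point rounding
```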
A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are widely used in many fields, research into finding the best and most efficient clustering algorithm is still ongoing. K-means is simple and easy to implement, but it is sensitive to the initialization of the cluster centers and can therefore become trapped in a local optimum. In this paper...
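The abstract's point about initialization can be seen with Lloyd's iteration, the usual heuristic meant by "k-means". The following is a minimal sketch of plain Lloyd's iteration, not the paper's krill-herd hybrid. The point set and starting centers are made-up illustrations.

```python
Point = tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points: list[Point], init_centers: list[Point], max_iter: int = 100):
    """Lloyd's iteration for k-means: assign every point to its nearest center, move each
    center to the mean of its points, and stop when the assignment stops changing.
    Returns (centers, k-means cost)."""
    centers = list(init_centers)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged to a fixed point, which may be only a local optimum
        assignment = new_assignment
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(lloyd(pts, [(5.0, 0.0), (5.0, 1.0)])[1])   # poor start: stuck at cost 100.0
    print(lloyd(pts, [(0.0, 0.0), (10.0, 0.0)])[1])  # good start: optimal cost 1.0
```

Both runs converge, but the first stops at a fixed point whose cost is 100 times the optimum. This is the kind of local optimum that better seeding or hybrid methods try to avoid.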
Scalable constant k-means approximation via heuristics on well-clusterable data
We present a simple heuristic clustering procedure, with running time independent of the data size, that combines random sampling with Single-Linkage (Kruskal’s algorithm), and show that with sufficient probability, it has a constant approximation guarantee with respect to the optimal k-means cost, provided an optimal solution satisfies a center-separability assumption. As the separation increa...
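The procedure in this abstract combines random sampling with single linkage computed by Kruskal's algorithm. The following sketch is one plausible reading, under these assumptions:

1. Draw m points uniformly without replacement.
2. Run Kruskal on the sample's complete graph until k components remain.
3. Use each component's centroid as a center.

The sample size m, the centroid step and all function names are assumptions, not details from the abstract. No approximation guarantee is claimed for this sketch.

```python
import random
from itertools import combinations

Point = tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance (gives the same edge order as plain distance)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def single_linkage(points: list[Point], k: int) -> list[list[Point]]:
    """Kruskal's algorithm on the complete graph over `points`, stopped once k components
    remain; this equals deleting the k-1 heaviest edges of a minimum spanning tree."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        combinations(range(len(points)), 2),
        key=lambda e: sq_dist(points[e[0]], points[e[1]]),
    )
    components = len(points)
    for i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different clusters: merge them
            parent[ri] = rj
            components -= 1
    clusters: dict[int, list[Point]] = {}
    for idx, p in enumerate(points):
        clusters.setdefault(find(idx), []).append(p)
    return list(clusters.values())


def sample_and_link(points: list[Point], k: int, m: int, rng: random.Random) -> list[Point]:
    """Hypothetical pipeline: sample m points without replacement, cluster the sample by
    single linkage, and return each cluster's centroid as a center."""
    sample = rng.sample(points, m)
    return [
        tuple(sum(coord) / len(cl) for coord in zip(*cl))
        for cl in single_linkage(sample, k)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    blob_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    blob_b = [(rng.gauss(50, 1), rng.gauss(50, 1)) for _ in range(500)]
    print(sample_and_link(blob_a + blob_b, k=2, m=40, rng=rng))
```

The work after sampling is O(m² log m), which depends only on m and not on n. This matches the abstract's claim of a running time independent of the data size. Single linkage recovers the true clusters only when they are well separated relative to gaps inside each cluster. This is consistent with the center-separability assumption the abstract mentions.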
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid and continuing growth of spatial trajectory data, together with the need to process these data and extract useful information and meaningful patterns, has drawn many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
Publication date: 2004